Integrating Vocabularies: Discovering and Representing Vocabulary Maps

Author

  • Borys Omelayenko
Abstract

The Semantic Web would enable new ways of doing business on the Web, which require advanced business document integration technologies capable of intelligent document transformation. The documents use different vocabularies that consist of large hierarchies of terms. Accordingly, vocabulary mapping and transformation becomes an important task in the overall business document transformation process. This task includes several subtasks (map discovery, map representation, and map execution) that must be seamlessly integrated into the document integration process. In this paper we discuss the process of discovering the maps between two vocabularies, assuming the availability of two sets of documents, each using one of the vocabularies. We take vocabularies of product classification codes as a playground and propose a reusable map discovery technique based on a Bayesian text classification approach. We show how the discovered maps can be integrated into the document transformation process.
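The map discovery idea in the abstract can be illustrated with a minimal sketch. It assumes a multinomial naive Bayes classifier with add-one smoothing and a simple majority vote per source code; the paper's actual formulation may differ. The function names, product descriptions, and codes (`A-100`, `B-PEN`, and so on) are hypothetical and serve only as illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption made for this sketch)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(doc):
                if tok in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def discover_map(docs_a, codes_a, docs_b, codes_b):
    """Map each code of vocabulary A to the vocabulary B code that the
    classifier predicts most often for documents carrying that A code."""
    clf = NaiveBayes().fit(docs_b, codes_b)
    votes = defaultdict(Counter)
    for doc, code in zip(docs_a, codes_a):
        votes[code][clf.predict(doc)] += 1
    return {code: counts.most_common(1)[0][0] for code, counts in votes.items()}


# Hypothetical toy data: product descriptions classified under two vocabularies.
docs_b = ["ballpoint pen blue ink", "gel pen black ink",
          "laptop computer 15 inch", "desktop computer tower"]
codes_b = ["B-PEN", "B-PEN", "B-PC", "B-PC"]
docs_a = ["pen with red ink", "notebook computer"]
codes_a = ["A-100", "A-200"]

print(discover_map(docs_a, codes_a, docs_b, codes_b))
# {'A-100': 'B-PEN', 'A-200': 'B-PC'}
```

The classifier is trained on the documents of one vocabulary and applied to the documents of the other, so the only shared signal is the text of the documents themselves. A real product catalogue would need better tokenization and a way to handle codes with no reliable counterpart.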


Similar resources

Changing Controlled Vocabularies

For the foreseeable future, controlled medical vocabularies will be in a constant state of development, expansion and refinement. Changes in controlled vocabularies must be reconciled with historical patient information which is coded using those vocabularies and stored in clinical databases. This paper explores the kinds of changes that can occur in controlled vocabularies, including adding te...


Linguistic Watermark 3.0: An RDF Framework and a Software Library for Bridging Language and Ontologies in the Semantic Web

In this paper, we present a framework for representing heterogeneous linguistic resources and for integrating their content with Semantic Web ontologies. This work, which extends and improves previous research conducted by these same authors, articulates into two main results: first, a set of coordinated RDF vocabularies providing descriptors for representing linguistic resources and their soft...


Vocabulary Conversion: Performance with Controlled and Uncontrolled Terms and Tags Technical

Controlled and uncontrolled indexing terminology and metadata may be converted from one to another. Decision criteria are developed that can be used to determine which terms should be assigned when converting vocabularies. Methods are developed for computing the parameters of these systems, as well as means for estimating the parameters when given limited information. These conversion technique...


Creating an Order in Distributed Digital Libraries by Integrating Independent Self-Organizing Maps

Digital document libraries are an almost perfect application arena for unsupervised neural networks. This is because many of the operations computers have to perform on text documents are classification tasks based on "noisy" input patterns. The "noise" arises because of the known inaccuracy of mapping natural language to an indexing vocabulary representing the contents of the documents. A growing...



Publication date: 2002